Quantifying biodiversity starts with a sample (or multiple samples) of a biological assemblage. The size (e.g., area or volume) of the sample is often referred to as the sample grain. For example, we might use a 1m2 quadrat to sample the benthic community on a coral reef.


Or use a line intercept transect of a set length (e.g., 50m).

Then, choose a metric…but, there are lots to choose from!

The proliferation of metrics is likely due to the multicomponent nature of biodiversity. Different metrics describe (incorporate) three different components - abundance, richness, and evenness - to varying degrees (McGill 2011b). For diversity considered at only a single scale only, changes in these three components combine to determine variation in biodiversity. We want to choose metrics that allow us to understand and accurately describe how numbers of individuals, as well as the richness and relative abundance of species covary.

Biodiversity through the lens of Individual-Based Rarefaction (IBR) curves

library(tidyverse)
library(mobsim)
library(mobr)
library(cowplot)

# set a RNG seed 
set.seed(42)

# simulate a sample with 20 species and 200 individuals, and 
# a lognormal SAD
sim_poisson_comm <- sim_poisson_community(s_pool = 20,
                                          n_sim = 200,
                                          sad_type = 'lnorm',
                                          sad_coef = list('meanlog' = log(200/20),
                                                          'sdlog' = 1))
# get the census, which contains the spatial coordinates of every individual
comm1 <- sim_poisson_comm$census %>% 
  as_tibble()

# need counts of each species for Species Abundance Distribution (and plotting 
# rarefaction curve)
comm1_long <- comm1 %>% 
  group_by(species) %>% 
  summarise(N = n())

# calculate IBR, and put it in a tibble (dataframe) for plotting  
comm1_ibr <- tibble(expected_richness = rarefaction(comm1_long$N, method = 'IBR')) %>% 
  mutate(individuals = 1:n())

# calculate the Probability of Interspecific Encounter (PIE)
comm1_PIE <- tibble(PIE = calc_div(comm1_long$N, index = 'PIE'))

# plot the sample and the IBR
comm1_map <- ggplot() +
  geom_point(data = comm1,
             aes(x = x, y = y, colour = species)) +
  scale_color_viridis_d() +
  theme_minimal() +
  theme(legend.position = 'none',
        axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_rect(fill = NA, colour = '#ff0000'))

comm1_ibr_plot <- ggplot() +
  geom_line(data = comm1_ibr,
            aes(x = individuals, y = expected_richness),
            linewidth = 1.1) +
  geom_point(aes(x = 1, y = 1),
             size = 2) +
  geom_text(aes(x = 35, y = 1, label = '{J = 1, S = 1}'),) +
  geom_point(data = comm1_ibr,
             aes(x = max(individuals), y = max(expected_richness)),
             size = 2) +
  geom_text(data = comm1_ibr %>% 
              # filter so as we only have the last row
              filter(individuals==max(individuals)),
            aes(x = individuals - 35, y = expected_richness - 3,
                label = paste0('{J = ', max(individuals), ', S = ',
                              max(expected_richness), '}'))) +
  # add a line to show PIE; I've got it going through the origin here,
  # but it is the slope of the IBR as we move from one individual to two
  geom_abline(data = comm1_PIE,
              aes(intercept = 0, slope = PIE),
              linewidth = 0.75, alpha = 0.5, linetype = 2) +
  geom_text(data = comm1_PIE,
            aes(x = 35, y = 17, label = paste0('PIE = ', round(PIE,2)))) +
  labs(x = 'Total number of individuals (J)',
       y = 'Expected species richness (S)') +
  theme_bw() 


plot_grid(comm1_map,
          comm1_ibr_plot, 
          nrow = 1)

We can visualise the key components of individuals, richness and evenness using the individual-based rarefaction curve. Numbers of individuals are on the x-axis, and the number of species on the y-axis. Evenness, or the relative abundance of species, is described by the shape of the curve. In particular, the slope at the base of the rarefaction curve, as it moves from one individual to two, is equal to the Probability of Interspecific Encounter (PIE;(Hurlbert 1971; Olszewski 2004). In practice, PIE is often transformed to an Effective Number of Species (ENS) to aid interpretation (L. Jost 2006). The ENS transformation of PIE, which we refer to as \(S_{PIE}\), is equal to the inverse of Simpson’s index (\(S_{PIE} = 1/\sum_{i=1}^{S} p_i^2\)), where \(p_i\) is the relative abundance of species i. \(S_{PIE}\) is also equal to Hill number of diversity index with order equal to two (Hill 1973; Lou Jost 2007).

Exercises

Simulate some data to investigate covariation in abundance, evenness and richness.

#--------set parameters for community abundance, richness and evenness-----
community_params <- expand_grid(
  # number of species
  Spool = c(10, 40, 80), 
  # total number of individuals
  J = c(50, 500),
  # evenness of species relative (log) abundance
  # smaller values result in more even distributions
  # of species relative abundance
  sdlog = c(0.8,2)) %>% 
  mutate(
    # species mean (log) abundance
    meanlog = log(J/Spool)
  ) %>% 
  # create identifier
  mutate(region = 1:n()) 


#--------simulate communities-------------------
communities <- community_params %>% 
  group_by(region) %>%
  nest(data = c(Spool, J, sdlog, meanlog)) %>%
  # create some replicates (multiple samples with the same parameters)
  uncount(20, .remove = FALSE) %>% 
  ungroup() %>% 
  mutate(sim = 1:n()) %>% 
  # generate sample with individuals randomly distributed in space
  # i.e., spatial sampling is poisson process
  mutate(poisson_comm = map(data, 
                            ~sim_poisson_community(
                              s_pool = .x$Spool,
                              n_sim = .x$J,
                              sad_type = 'lnorm',
                              sad_coef = list('meanlog' = .x$meanlog,
                                              'sdlog'= .x$sdlog)))) %>% 
  # prepare stem map for visualisation 
  mutate(poisson_stem_map = map(poisson_comm, 
                                ~tibble(x = .x$census$x,
                                        y = .x$census$y,
                                        species = .x$census$species))) %>% 
  # convert sample to SAD for further calculations
  mutate(poisson_sad = map(poisson_comm, ~community_to_sad(.x)),
         # to 2d
         poisson_sad = map(poisson_sad, ~tibble(species = names(.x),
                                                N = as.numeric(.x)))) %>% 
  # calculate IBR
  mutate(poisson_ibr = map(poisson_sad, ~rarefaction(x = .x$N,
                                                     method = 'IBR')),
         # wrangle for plotting
         poisson_ibr = map(poisson_ibr, ~tibble(individuals = 1:length(.x),
                                                expected_richness = as.numeric(.x)))) 

# examine abundance, richness and evenness for the different communities
metrics <- communities %>% 
  select(sim, region, data, poisson_sad) %>% 
  unnest(c(poisson_sad)) %>% 
  group_by(sim, region) %>% 
  summarise(J = sum(N),
            S = calc_div(N, index = 'S'),
            PIE = calc_div(N, index = 'PIE'),
            S_PIE = calc_div(N, index = 'S_PIE')) %>% 
  ungroup() %>% 
  # put the known parameters back in for visualisation
  left_join(community_params)
# plot ibr
communities %>% 
  unnest(c(data, poisson_ibr)) %>% 
  select(sim, region, individuals, expected_richness) %>% 
  mutate(spatial_dist = 'random') %>% 
  left_join(community_params, by = 'region') %>% 
  mutate(region_label = paste0('S = ',Spool,
                               ', J = ', J,
                               ', sd = ', sdlog)) %>% 
  ggplot() +
  facet_wrap(~region_label, scales = 'free') + 
  geom_line(aes(x = individuals, y = expected_richness, 
                group = sim)) +
  labs(y = 'Expected number of species',
       x = 'Number of individuals') +
  theme(legend.position = 'none')

# plot metrics
ggplot() +
  geom_boxplot(data = metrics,
               aes(x = as.factor(J), y = S,
                   group = region, colour = as.factor(sdlog))) +
  scale_colour_manual(name = 'Evenness',
                      values = c('0.8' = '#ffa600',
                                  '2' = '#003f5c'),
                      labels = c('0.8' = 'more even',
                                 '2' = 'less even')) +
  labs(y = 'Species richness',
       x = 'Number of individuals')

But, scale.

Unfortunately, our simplification to three components is only useful for biodiversity accounting at one scale. The shape of the rarefaction curve is again a useful, albeit limited, abstraction of the difficulty ahead: biodiversity scales non-linearly (Arrhenius 1921; Rosenzweig 1995). The individual-based rarefaction curve is often limited because with sufficient sampling effort on the x-axis, the curve cab bend upwards again, e.g., the Species Area Relationship (SAR) can have bend upwards at very large spatial scales (Rosenzweig 1995), sometimes referred to as a triphasic SAR. The nonlinear scaling of biodiversity is well known, but not necessarily well understood, particularly the potential for non-linear scaling to impact estimates of biodiversity change at different scales. One way to approach the non-linear scaling is via autocorrelation or within species aggregation (McGill 2011a).

Here we will approach the scale-dependence of biodiversity using a diversity partition introduced by Whittaker (1960) that considers two discrete spatial scales and another to relate the two (Chase et al. 2018). Specifically, the larger (sometimes referred to as the regional) scale is often called the gamma (\(\gamma\)) scale, and the smaller (local) scale is called the alpha (\(\alpha\)) scale (and the mean \(\alpha\)-diversity is \(\bar{\alpha}\)). Whittaker’s diversity partition assumes a multiplicative relationship between the two scales:

\[\gamma = \bar{\alpha} * \beta,\] where \(\beta\)-diversity describes the variation among samples in species composition.

By making scale discrete, we can continue to use individual-based rarefaction curves to visualise and describe biodiversity. We’ll consider each sample as representing the \(\alpha\)-scale. And then combine multiple samples to create the \(\gamma\)-scale. The area encompassed by all samples combined is often referred to as the sample extent.

# simulate a community with 200 species and 5000 individuals, and 
# a lognormal SAD, and individuals randomly distributed in space
Spool <- 200
Jpool <- 5000
sim_poisson_comm <- sim_poisson_community(s_pool = Spool,
                                          n_sim = Jpool,
                                          sad_type = 'lnorm',
                                          sad_coef = list('meanlog' = log(Spool/Jpool),
                                                          'sdlog' = 1))
# get the census, which contains the spatial coordinates of every individual
comm1 <- sim_poisson_comm$census %>% 
  as_tibble()

# simulate quadrat sampling from the community
quad_area <- 0.01

samps <- mobsim::sample_quadrats(comm = sim_poisson_comm,
                n_quadrats = 5, 
                quadrat_area = quad_area,
                method = 'random',
                avoid_overlap = FALSE,
                plot = FALSE)

quad_xy <- samps$xy_dat %>% 
  as_tibble() %>% 
  rownames_to_column(var = 'site')

# plot samples
plot_samples <- ggplot() +
  geom_point(data = comm1,
             aes(x = x, y = y, colour = species)) +
  geom_rect(data = quad_xy,
            aes(xmin = x, ymin = y,
                xmax = x + sqrt(quad_area),
                ymax = y + sqrt(quad_area)),
            fill = alpha('grey',0),
                colour = '#ff0000',
            linewidth = 1.1) +
  scale_color_viridis_d() +
  theme_minimal() +
  theme(legend.position = 'none',
        axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_rect(fill = NA, colour = 'black'))


# need counts of each species for Species Abundance Distribution (and plotting 
# rarefaction curve). Our samples are in a site x species matrix (sometimes
# called wide data [cf. long data]).
alpha_long <- samps$spec_dat %>% 
  as_tibble() %>% 
  rownames_to_column(var = 'site') %>% 
  pivot_longer(cols = !site,
               names_to = 'species',
               values_to = 'N')

# calculate IBR, and put it in a tibble (dataframe) for plotting  
alpha_ibr <- alpha_long %>% 
  group_by(site) %>% 
  nest(data = c(species, N)) %>% 
  mutate(ibr = map(data, ~rarefaction(.x$N, method = 'IBR'))) %>% 
  unnest(ibr) %>% 
  mutate(individuals = 1:n()) %>% 
  ungroup()

# to get the gamma-scale curve, we need to aggregate the species counts
# across species
gamma_long <- alpha_long %>% 
  group_by(species) %>% 
  summarise(N = sum(N)) %>% 
  ungroup()

gamma_ibr <- tibble(gamma_richness = rarefaction(gamma_long$N, method = 'IBR')) %>% 
  mutate(individuals = 1:n())

# plot rarefaction curves
ibr_2scales <- ggplot() +
  geom_line(data = alpha_ibr,
            aes(x = individuals, y = ibr, group = site, colour = 'alpha')) +
  geom_line(data = gamma_ibr,
            aes(x = individuals, y = gamma_richness, colour = 'gamma')) +
  scale_colour_manual(name = 'Scale',
                      values = c('alpha' = '#025c00',
                                 'gamma' = '#ffc619'),
                      labels = c(expression(alpha),
                                 expression(gamma))) +
  labs(x = 'Number of individuals',
       y = 'Expected number of species') +
  theme_bw() +
  theme(legend.position = c(0.99,0.01),
        legend.justification = c(1,0))

plot_grid(plot_samples,
          ibr_2scales,
          nrow = 1)

We can use Whittaker’s diversity partition to calculate \(\beta\)-diversity for these data (i.e., \(\beta = \frac{\gamma}{\bar{\alpha}}\). Before doing so, what does it means when the \(\alpha\)-scale curves are so closely aligned with the \(\gamma\)-scale curve? The overlapping rarefaction curves means that the smaller (\(\alpha\)) scale samples are effectively a random subsample of the larger (\(\gamma\)) scale (Blowes, Belmaker, and Chase 2017; Chase et al. 2018). We know this to be true for these simulated data: the individuals of all species were distributed randomly in space (via a poisson point process model). Before diving into some beta-diversity metrics, let’s look at how within species aggregation changes what we see.

# simulate a community with 200 species and 5000 individuals, and 
# a lognormal SAD (i.e., same as previous community), only now individuals will
# have an aggregated distribution in space
# Spool <- 200
# Jpool <- 5000

# see ?sim_thomas_community(): how is the spatial distribution of species
# controlled in this function?
sim_aggr_comm <- sim_thomas_community(s_pool = Spool,
                                      n_sim = Jpool,
                                      sad_type = 'lnorm',
                                      sad_coef = list('meanlog' = log(Spool/Jpool),
                                                      'sdlog' = 1))
# get the census, which contains the spatial coordinates of every individual
comm2 <- sim_aggr_comm$census %>% 
  as_tibble()

# here, well use a quadrat to sample individuals from the community,
# set area of quadrat
quad_area <- 0.01

samps2 <- mobsim::sample_quadrats(comm = sim_aggr_comm,
                n_quadrats = 5, 
                quadrat_area = quad_area,
                # other spatial distributions for quadrats are available:
                # grids and transects
                method = 'random',
                avoid_overlap = FALSE, # setting this to true breaks currently!
                plot = FALSE)

# xy_dat has the xy coordinates for the lower left corner of every quadrat sample
quad_xy <- samps2$xy_dat %>% 
  as_tibble() %>% 
  rownames_to_column(var = 'site')

# plot the community and the location of the quadrat samples
plot_samples <- ggplot() +
  geom_point(data = comm2,
             aes(x = x, y = y, colour = species)) +
  geom_rect(data = quad_xy,
            aes(xmin = x, ymin = y,
                xmax = x + sqrt(quad_area),
                ymax = y + sqrt(quad_area)),
            fill = alpha('grey',0),
                colour = '#ff0000',
            linewidth = 1.1) +
  scale_color_viridis_d() +
  theme_minimal() +
  theme(legend.position = 'none',
        axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_rect(fill = NA, colour = 'black'))


# We need counts of each species for Species Abundance Distribution (and plotting 
# rarefaction curve). Our samples are in a site x species matrix (sometimes
# called wide data [cf. long data]).

# wide to long (for plotting)
alpha_long2 <- samps2$spec_dat %>% 
  as_tibble() %>% 
  rownames_to_column(var = 'site') %>% 
  pivot_longer(cols = !site,
               names_to = 'species',
               values_to = 'N')

# calculate IBR, and put it in a tibble (dataframe) for plotting  
alpha_ibr2 <- alpha_long2 %>% 
  group_by(site) %>% 
  nest(data = c(species, N)) %>% 
  mutate(ibr = map(data, ~rarefaction(.x$N, method = 'IBR'))) %>% 
  unnest(ibr) %>% 
  mutate(individuals = 1:n()) %>% 
  ungroup()

# to get the gamma-scale curve, we need to aggregate the individual counts for 
# each species
gamma_long2 <- alpha_long2 %>% 
  group_by(species) %>% 
  summarise(N = sum(N)) %>% 
  ungroup()

# calculate the gamma-scale IBR
gamma_ibr2 <- tibble(gamma_richness = rarefaction(gamma_long2$N, method = 'IBR')) %>% 
  mutate(individuals = 1:n())

# plot rarefaction curves (both scales)
ibr_2scales2 <- ggplot() +
  geom_line(data = alpha_ibr2,
            aes(x = individuals, y = ibr, group = site, colour = 'alpha')) +
  geom_line(data = gamma_ibr2,
            aes(x = individuals, y = gamma_richness, colour = 'gamma')) +
  scale_colour_manual(name = 'Scale',
                      values = c('alpha' = '#025c00',
                                 'gamma' = '#ffc619'),
                      labels = c(expression(alpha),
                                 expression(gamma))) +
  labs(x = 'Number of individuals',
       y = 'Expected number of species') +
  theme_bw() +
  theme(legend.position = c(0.99,0.01),
        legend.justification = c(1,0))

plot_grid(plot_samples,
          ibr_2scales2,
          nrow = 1)

Now, we see that the \(\gamma\)-scale curve does not fall directly on top of the \(\alpha\)-scale curves. This means that the smaller scale samples no longer capture a random subset of the species captured by all the samples combined. This is due to within species aggregation. Environmental conditions are heterogeneous in nature, and the conditions our samples document will typically be more similar when the samples are in close proximity to each other (in space &/or time). Because species often have idiosyncratic responses to environmental heterogeneity (species may have different niches (Chase and Leibold 2003) for example), and other processes that influence the spatial distribution of individuals (e.g., dispersal) likely vary among species (and at the very least are stochastic to some degree for individuals), most species exhibit some degree of within species aggregation. So, communities - and hopefully our samples that we take to represent them - likely fall somewhere between these two extremes (of closely overlapping accumulation curves at the \(\alpha\)- and \(\gamma\)-scales, and strongly diverging).

When we embrace the scale-dependent nature of biodiversity variation there are four components that we’d like our biodiversity metrics to accurately describe: abundance, evenness, richness, and within species aggregation (Chase et al. 2018; McGlinn et al. 2019). Changes in all of these components underpin variation in diversity. We stand to gain new insights with an accurate accounting of how they combine across scales.

Exercise: Biodiversity accounting computer lab

Use mobsim to simulate sampling from some different communities. Vary community size (total abundance), the size of the species pool, evenness, and aggregation, but take the same number of samples from each community. Plot rarefaction curves and calculate some metrics to explore the relationship between different components at the \(\alpha-\) and \(\gamma\)-scales.

References

Arrhenius, Olof. 1921. “Species and Area.” Journal of Ecology 9 (1): 95–99.
Blowes, Shane A., Jonathan Belmaker, and Jonathan M. Chase. 2017. “Global Reef Fish Richness Gradients Emerge from Divergent and Scale-Dependent Component Changes.” Proceedings of the Royal Society B: Biological Sciences 284 (1867): 20170947.
Chase, Jonathan M., and Mathew A Leibold. 2003. Ecological Niches: Linking Classical and Contemporary Approaches. University of Chicago Press.
Chase, Jonathan M., Brian J. McGill, Daniel J. McGlinn, Felix May, Shane A. Blowes, Xiao Xiao, Tiffany M. Knight, Oliver Purschke, and Nicholas J. Gotelli. 2018. “Embracing Scale-Dependence to Achieve a Deeper Understanding of Biodiversity and Its Change Across Communities.” Ecology Letters 21 (11): 1737–51.
Hill, M. O. 1973. “Diversity and Evenness: A Unifying Notation and Its Consequences.” Ecology 54 (2): 427–32.
Hurlbert, Stuart H. 1971. “The Nonconcept of Species Diversity: A Critique and Alternative Parameters.” Ecology 52 (4): 577–86.
Jost, L. 2006. “Entropy and Diversity.” Oikos 113 (2): 363–75.
Jost, Lou. 2007. “Partitioning Diversity into Independent Alpha and Beta Components.” Ecology 88 (10): 2427–39.
McGill, Brian J. 2011a. “Linking Biodiversity Patterns by Autocorrelated Random Sampling.” American Journal of Botany 98 (3): 481–502.
———. 2011b. “Species Abundance Distributions.” In Biological Diversity: Frontiers In Measurement & Assessment (Eds Magurran, A. & McGill, B.), 105–22. Oxford University Press.
McGlinn, Daniel J., Xiao Xiao, Felix May, Nicholas J. Gotelli, Thore Engel, Shane A. Blowes, Tiffany M. Knight, Oliver Purschke, Jonathan M. Chase, and Brian J. McGill. 2019. “Measurement of Biodiversity (MoB): A Method to Separate the Scale-Dependent Effects of Species Abundance Distribution, Density, and Aggregation on Diversity Change.” Methods in Ecology and Evolution 10 (2): 258–69.
Olszewski, Thomas D. 2004. “A Unified Mathematical Framework for the Measurement of Richness and Evenness Within and Among Multiple Communities.” Oikos 104 (2): 377–87.
Rosenzweig, Michael L. 1995. Species Diversity in Space and Time. Cambridge University Press.
Whittaker, Robert Harding. 1960. “Vegetation of the Siskiyou Mountains, Oregon and California.” Ecological Monographs 30 (3): 279–338.